Differentially Private Analysis of Outliers


Abstract

This paper investigates differentially private analysis of distance-based outliers. The problem of outlier detection is to find a small number of instances that are apparently distant from the remaining instances. On the other hand, the objective of differential privacy is to conceal the presence (or absence) of any particular instance. Outlier detection and privacy protection are thus intrinsically conflicting tasks. In this paper, instead of reporting the outliers detected, we present two types of differentially private queries that help to understand the behavior of outliers. One is the query to count outliers, which reports the number of outliers that appear in a given subspace. Our formal analysis of the exact global sensitivity of outlier counts reveals that a regular global-sensitivity-based method can make the outputs too noisy, particularly when the dimensionality of the given subspace is high. Noting that the counts of outliers are typically expected to be small relative to the number of data points, we introduce a mechanism based on the smooth upper bound of the local sensitivity. The other is the query to discover the top-$h$ subspaces containing a large number of outliers. This task can be naively achieved by issuing count queries to each subspace in turn. However, the number of subspaces can grow exponentially in the data dimensionality, which can cause serious consumption of the privacy budget. For this task, we propose an exponential mechanism with a customized score function for subspace discovery. To the best of our knowledge, this study is the first attempt to ensure differential privacy for distance-based outlier analysis. We demonstrate our methods with synthesized and real datasets. The experimental results show that our methods achieve better utility than the global-sensitivity-based methods.
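To make the two query types concrete, the following is a minimal Python sketch of the baseline ideas the abstract refers to, not the paper's own mechanisms. It assumes one common definition of a distance-based outlier (a point with fewer than `k` other points within distance `r` in the chosen subspace), adds Laplace noise scaled by a caller-supplied sensitivity bound, and selects top-$h$ subspaces by repeatedly applying a textbook exponential mechanism with the raw outlier count as the score. All function names, the parameters `r` and `k`, the even split of the budget across the $h$ rounds, and the `sensitivity` argument are illustrative assumptions. The abstract does not give the exact global sensitivity, the smooth upper bound of the local sensitivity, or the customized score function, so none of them is reproduced here.

```python
import math
import random


def count_outliers(points, dims, r, k):
    """Count points with fewer than k other points within distance r,
    measuring distance only over the coordinates listed in dims.
    (Assumed outlier definition; the abstract does not spell it out.)"""
    def dist(a, b):
        return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

    count = 0
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points) if j != i and dist(p, q) <= r
        )
        if neighbours < k:
            count += 1
    return count


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(points, dims, r, k, epsilon, sensitivity, rng):
    """Laplace mechanism. `sensitivity` must be a valid bound supplied by
    the caller; this sketch does not derive it."""
    true_count = count_outliers(points, dims, r, k)
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # subtract the max score for numerical stability
    weights = [
        math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def top_h_subspaces(points, subspaces, r, k, h, epsilon, sensitivity, rng):
    """Select h subspaces without replacement, spending epsilon / h per
    round (an assumed budget split, with the raw count as the score)."""
    remaining = list(subspaces)
    chosen = []
    for _ in range(min(h, len(remaining))):
        scores = [count_outliers(points, s, r, k) for s in remaining]
        pick = exponential_mechanism(
            remaining, scores, epsilon / h, sensitivity, rng
        )
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

A call such as `top_h_subspaces(points, [(0,), (1,), (0, 1)], r=1.0, k=1, h=2, epsilon=1.0, sensitivity=s, rng=random.Random(0))` returns two of the three candidate subspaces. The privacy guarantee holds only if `s` really bounds the sensitivity of the count, which is the quantity the paper analyzes.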
